The monitoring of human behavior and traffic surveillance in various 
locations has become increasingly important in recent years. However, 
identifying abnormal activity in real-world settings is a challenging task due 
to the many different types of worrisome and abnormal actions, including 
theft, violence, and accidents. To address this issue, this paper proposes a 
new framework for deep learning-based anomaly identification in videos 
using the squirrel search algorithm and bidirectional long short-term 
memory (BiLSTM). The proposed method combines the squirrel search 
algorithm, an optimization technique inspired by nature, with BiLSTM for 
anomaly recognition. The framework uses the knowledge gained from a 
sequence of frames to categorize the video as either normal or abnormal. The 
proposed method was extensively tested on several benchmark datasets for 
anomaly detection to confirm its effectiveness in challenging surveillance 
conditions. The results show that the proposed framework outperforms 
existing methods in terms of area under curve (AUC) values, with a test set 
AUC score of 93.1%. The paper also discusses the importance of feature 
selection and the benefits of using BiLSTM over traditional unidirectional 
long short-term memory (LSTM) models for anomaly detection in videos. 
Overall, the proposed framework automates anomaly detection with high 
precision, making it an effective tool for identifying abnormal human 
behavior in surveillance footage. 
1. INTRODUCTION 


Surveillance is the practice of monitoring people's behavior and actions in order to manage and 
protect them. Owing to its wide variety of uses, particularly wide-area surveillance and health monitoring, 
automatic detection of anomalous events in video has received much attention recently. Because an 
anomaly is unknown beforehand and can result from unusual behaviors or from activities performed in 
unfamiliar situations, this challenge differs from event detection, where the event is specified in advance. 
Internet of things (IoT) devices such as closed-circuit television (CCTV) cameras are the most common way 
to observe people and activities from a distance. Artificial intelligence is implemented in IoT devices to 
improve quality of life [1], [2]. Numerous CCTV cameras have recently been placed in various public and 
private spaces for security, traffic monitoring, and other purposes. These cameras can assist in spotting 
irregularities and taking appropriate action [3]. Anomaly detection is currently an active research topic in a 
variety of fields, including business and industry [4]-[6]. Anomaly detection can help protect both public 
and private assets. Security-related uses for surveillance cameras include spotting odd behavior or 
abnormalities in both public and private spaces [7], [8]. Real-time video analysis and detection of suspicious 
cases require a lot of human resources and are prone to errors because human attention gradually declines. 
Several techniques in the literature define anomalous activity as "the occurrence of deviation in regular 
patterns." 

Optimization is the process of choosing a function's decision variables so that the function attains its 
maximum or minimum value. Numerous real-world engineering problems are optimization problems 
[9]-[11], in which the decision variables are chosen so that the system operates at its optimum. Such 
problems are typically discontinuous, nondifferentiable, multimodal, and nonconvex, making it impossible to 
apply traditional gradient-based deterministic methods [12]-[14]. In recent decades, a large number of 
stochastic optimization algorithms [15]-[17] have been developed to address the limitations of classical 
methods. They are inspired mainly by biological behaviors or physical phenomena. Unfortunately, most 
basic metaheuristic algorithms produce unsatisfactory results on the complex optimization problems that 
arise in real-world settings. 

The squirrel search algorithm (SSA), inspired by the dynamic foraging behavior of flying squirrels, 
was developed in [18]. Because SSA integrates a seasonal monitoring condition, it offers the advantage of 
better and more efficient exploration of the search space compared with other algorithms. Additionally, the 
forest region contains three types of trees (normal, oak, and hickory), which preserves population 
diversity and improves exploration. SSA has been reported to outperform other well-known algorithms such 
as the genetic algorithm (GA) [19], particle swarm optimization (PSO) [20], the bat algorithm (BA) [21], and 
the firefly algorithm (FF) [22]. Then, rather than feeding our classifier model one feature frame at a time, we 
combine the features of fifteen consecutive frames by summing their values [23]. Weakly supervised 
techniques and bidirectional long short-term memory (BiLSTM) are used to train our classifier model. 
BiLSTMs are particularly useful when the context of the input is essential. In a unidirectional long 
short-term memory (LSTM), information flows in one direction only, from past to future. A BiLSTM instead 
uses two hidden states so that information flows both forward and backward. As a consequence, BiLSTMs 
are better able to capture context [24]. The primary contributions of this study are as follows: 

— The SSA uses a normal cloud generator to produce new positions for flying squirrels while they are 
gliding, which improves the SSA's exploration capability. 

— A selection strategy between successive positions is proposed to keep each flying squirrel in the best 
position found during the optimization process, which enhances the algorithm's exploitation ability and 
strengthens its local search capability. 

— The extracted features are then used for anomaly identification with the BiLSTM. 

The researchers in [25], [26] proposed a novel method for describing a person's current behavioral 
state based on the location and velocity of the subject and its surroundings. They used an interaction energy 
potential function, which describes social behavior through the relationship between the action of a social 
behavior and its energy potential. A support vector machine was employed to classify unusual energy-action 
patterns as anomalies. This approach exploits the connection between a person's present state and their 
reactions. 

Direkoglu et al. [27] proposed a novel optical-flow-based feature for the detection of abnormal 
crowd behavior. It operates at the pixel level: the technique looks for unusual behavior using the angle 
difference at each pixel between the current frame and the preceding one. A simple one-class support vector 
machine is employed to model normal behavior. They ran tests using the UMN and PETS2009 datasets. 
Li et al. [28] created a maximum a posteriori (MAP) framework for anomaly identification that utilizes prior 
information. Prior information is combined within the Bayesian framework to identify anomalies, and the 
maximum grid template is used to calculate the likelihood function. The method is reported to deliver useful 
results in difficult circumstances. 

Ullah et al. [29] modeled pedestrian movement in crowded areas and detected abnormalities. For the 
detection and localization of abnormal objects in pedestrian flow, they proposed the Gaussian kernel based 
integration model (GKIM). They computed the EER and DR at the grid and frame levels. The abnormal 
entity is found based on its location in the frame. Vu et al. [30] used the human-related crime (HR-crime) 
dataset to develop a feature extraction pipeline for human-related anomaly detection. A benchmark study of 
anomaly detection on HR-crime is also provided. The technique of Majhi et al. [31] handles anomalies in a 
single model using a weakly supervised learning approach. I3D with a many-to-many LSTM [32] was used 
for feature extraction, achieving an area under curve (AUC) value of 82.12%. 
Robust temporal feature magnitude (RTFM) learning is a method proposed by Tian et al. [33] 
to considerably improve the robustness of the multiple instance learning (MIL) strategy to negative 
instances from abnormal videos. To correctly identify the positive instances, RTFM trains a feature 
magnitude learning function on the temporal feature magnitudes of the video samples. Cao et al. [34] 
proposed using an adaptive graph convolutional network (GCN) to detect video anomalies. Their 
technique builds a global graph that takes feature similarity into account. 


2. METHOD 
2.1. Proposed workflow 
The stages of the proposed method are as follows: 

— The SSA increases its capacity for exploration by using a normal cloud generator to provide new 
positions for flying squirrels while they are gliding. 

— To keep each flying squirrel in its best position during the optimization process, a selection strategy 
between successive positions is proposed, which improves the algorithm's exploitation ability. The local 
search capability is further strengthened using a search enhancement technique. 

— The extracted features are then used to identify anomalies with the BiLSTM architecture. 

Figure 1 gives a visual overview of the steps involved in the proposed approach for anomaly detection in 
videos. 


Figure 1. Method 


2.2. Squirrel search optimization algorithm 

SSA mimics the dynamic foraging behavior of flying squirrels in the deciduous forests of Europe 
and Asia, which glide to travel long distances [18]. During warm weather, squirrels search the forest for food 
sources by gliding from one tree to another. They can easily find acorn nuts to meet their daily energy 
needs. They then search for hickory nuts, the best food source, which they store for the winter. Although 
they are less active in the winter, the stored hickory nuts supply their energy. As the weather warms, the 
flying squirrels become active again. This process repeats throughout the squirrels' life span and forms the 
basis of the SSA. The foraging behavior of the flying squirrels can be modeled mathematically in the 
following steps. 


2.2.1. Initialize the algorithm parameters and establish the positions and sorting of flying squirrels 

The main parameters are the population size NP, the maximum number of iterations Itermax, the 
number of decision variables n, the predator presence probability Pdp, the scaling factor sf, the gliding 
constant Gc, and the upper and lower bounds of the decision variables, FSU and FSL. These parameters are 
set at the start of the process. 
The positions of the flying squirrels are initialized randomly in the search space as (1): 


$$FS_{i,j} = FS_L + \mathrm{rand}() \times (FS_U - FS_L), \quad i = 1, 2, \ldots, NP, \; j = 1, 2, \ldots, n \tag{1}$$


where rand() returns a uniformly distributed random number in the range [0, 1]. 
Substituting the decision variables into a fitness function yields the fitness values 
$f = (f_1, f_2, \ldots, f_{NP})$ of the flying squirrels' locations: 

$$f_i = f(FS_{i,1}, FS_{i,2}, \ldots, FS_{i,n}), \quad i = 1, 2, \ldots, NP \tag{2}$$


Then, the fitness values of the flying squirrels' locations are used to rank the quality of the food sources in 
ascending order: 

$$[\mathrm{sorted\_f}, \mathrm{sorte\_index}] = \mathrm{sort}(f) \tag{3}$$
After the food sources of the flying squirrels' locations are sorted, three types of trees are 
distinguished: hickory trees (a source of hickory nuts), oak trees (a source of acorn nuts), and normal trees. 
The best food source (i.e., the one with the lowest fitness value) is taken to be on the hickory nut tree 
(FSht), the next three food sources are taken to be on acorn nut trees (FSat), and the remaining sources 
are taken to be on normal trees (FSnt): 


$$FS_{ht} = FS(\mathrm{sorte\_index}(1)) \tag{4}$$
$$FS_{at}(1{:}3) = FS(\mathrm{sorte\_index}(2{:}4)) \tag{5}$$
$$FS_{nt}(1{:}NP-4) = FS(\mathrm{sorte\_index}(5{:}NP)) \tag{6}$$
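
As an illustration only, the initialization and ranking steps in (1) to (6) can be sketched in Python as follows. The sphere fitness function, the bounds, and the population size are hypothetical choices made for the example and are not taken from this work.

```python
import random


def initialize(np_size, n, fs_l, fs_u, rng):
    """Equation (1): uniform random positions inside [fs_l, fs_u]."""
    return [[fs_l + rng.random() * (fs_u - fs_l) for _ in range(n)]
            for _ in range(np_size)]


def sphere(position):
    """Hypothetical fitness function (sum of squares); lower is better."""
    return sum(x * x for x in position)


def categorize(positions, fitness_fn):
    """Equations (2)-(6): evaluate, sort in ascending order, split by tree type."""
    fitness = [fitness_fn(p) for p in positions]
    order = sorted(range(len(positions)), key=lambda i: fitness[i])
    fs_ht = positions[order[0]]                  # best location: hickory tree
    fs_at = [positions[i] for i in order[1:4]]   # next three: acorn trees
    fs_nt = [positions[i] for i in order[4:]]    # remaining: normal trees
    return fs_ht, fs_at, fs_nt


rng = random.Random(0)
squirrels = initialize(np_size=10, n=2, fs_l=-5.0, fs_u=5.0, rng=rng)
hickory, acorn, normal = categorize(squirrels, sphere)
print(len(acorn), len(normal))
```

With a population of ten, the split always yields one hickory location, three acorn locations, and six normal locations, regardless of the fitness function used.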


2.2.2. Produce new locations through gliding 

After the flying squirrels glide, three scenarios may occur. 

Scenario 1: flying squirrels on acorn nut trees tend to move toward the hickory nut tree. The new 
locations are generated as follows: 


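$$FS_{at}^{new} = \begin{cases} FS_{at}^{t} + d_g G_c \,(FS_{ht}^{t} - FS_{at}^{t}), & \text{if } R_1 \ge P_{dp} \\ \text{random location}, & \text{otherwise} \end{cases} \tag{7}$$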


where R1 is a random number drawn from a uniform distribution on [0, 1], dg is the random gliding 
distance, and Gc is the gliding constant. 
Scenario 2: some flying squirrels on normal trees may move toward an acorn nut tree to meet their daily 
energy needs. The new locations are generated as follows: 


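$$FS_{nt}^{new} = \begin{cases} FS_{nt}^{t} + d_g G_c \,(FS_{at}^{t} - FS_{nt}^{t}), & \text{if } R_2 \ge P_{dp} \\ \text{random location}, & \text{otherwise} \end{cases} \tag{8}$$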


where R2 is a random number drawn from a uniform distribution on [0, 1]. 

Scenario 3: some flying squirrels on normal trees that have already met their daily energy needs may move 
toward the hickory nut tree. The new position of the squirrels in this scenario is determined as follows: 


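$$FS_{nt}^{new} = \begin{cases} FS_{nt}^{t} + d_g G_c \,(FS_{ht}^{t} - FS_{nt}^{t}), & \text{if } R_3 \ge P_{dp} \\ \text{random location}, & \text{otherwise} \end{cases} \tag{9}$$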


where R3 is a random number drawn from a uniform distribution on [0, 1]. 

The gliding distance dg is assumed to lie between 9 and 20 meters in all scenarios [18]. Because this 
value is relatively large, it may introduce significant perturbations and degrade the algorithm's 
performance. A scaling factor (sf), whose value is set to 18 [18], is therefore used as a divisor of dg to 
achieve the desired performance. 
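
The three scenarios share one update pattern: glide toward a target tree when no predator appears, otherwise jump to a random location. A minimal Python sketch of that pattern is given below. It assumes that dg is sampled uniformly between 9 and 20 before being divided by sf, and that new positions are clamped to the search bounds; the values Pdp = 0.1 and Gc = 1.9 in the usage lines are likewise assumptions for illustration, since they are not stated in this work.

```python
import random


def glide(current, target, p_dp, g_c, sf, fs_l, fs_u, rng):
    """One position update following the pattern of equations (7)-(9)."""
    if rng.random() >= p_dp:
        # No predator: glide toward the target tree.
        d_g = rng.uniform(9.0, 20.0) / sf  # scaled gliding distance (assumed uniform)
        new = [c + d_g * g_c * (t - c) for c, t in zip(current, target)]
    else:
        # Predator present: move to a random location in the search space.
        new = [fs_l + rng.random() * (fs_u - fs_l) for _ in current]
    # Assumption: clamp the result to the search bounds.
    return [min(max(x, fs_l), fs_u) for x in new]


rng = random.Random(1)
acorn_squirrel = [2.0, -1.0]
hickory_tree = [0.5, 0.5]
print(glide(acorn_squirrel, hickory_tree, p_dp=0.1, g_c=1.9, sf=18.0,
            fs_l=-5.0, fs_u=5.0, rng=rng))
```

Scenario 1 corresponds to calling the function with an acorn-tree squirrel and the hickory tree, Scenario 2 to a normal-tree squirrel and an acorn tree, and Scenario 3 to a normal-tree squirrel and the hickory tree.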


2.2.3. Check the seasonal monitoring condition and the stopping criterion 
Seasonal changes have a significant impact on flying squirrels' foraging behavior. The seasonal constant Sc 
and its minimum value are computed as: 


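$$S_c^{t} = \sqrt{\sum_{k=1}^{n} \left(FS_{at,k}^{t} - FS_{ht,k}\right)^2}, \quad t = 1, 2, 3 \tag{10}$$

$$S_{cmin} = \frac{10E{-6}}{365^{\,Iter/(Iter_{max}/2.5)}} \tag{11}$$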


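Then, the seasonal monitoring condition is checked. When $S_c^{t} < S_{cmin}$, winter is over, and the 
affected flying squirrels are relocated as: 

$$FS_{nt}^{new} = FS_L + \text{Lévy}(n) \times (FS_U - FS_L) \tag{12}$$

where Lévy(n) denotes a step drawn from the Lévy distribution. 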

The algorithm terminates when the maximum number of iterations is reached. Otherwise, the steps 
of generating new locations and checking the seasonal monitoring condition are repeated. 
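
The seasonal check and relocation in (10) to (12) can be sketched in Python as follows. The sketch assumes that the Lévy step is generated with Mantegna's method using beta = 1.5 and a scale of 0.01; neither the generator nor these constants are specified in this work, so they should be read as placeholders.

```python
import math
import random


def seasonal_constant(fs_at, fs_ht):
    """Equation (10): Euclidean distance between an acorn-tree squirrel and the hickory tree."""
    return math.sqrt(sum((a - h) ** 2 for a, h in zip(fs_at, fs_ht)))


def seasonal_minimum(iteration, iter_max):
    """Equation (11): threshold that shrinks as the iterations progress."""
    return 10e-6 / (365.0 ** (iteration / (iter_max / 2.5)))


def levy_step(rng, beta=1.5):
    """Lévy-distributed step via Mantegna's method (assumed generator and constants)."""
    numerator = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    denominator = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma = (numerator / denominator) ** (1 / beta)
    r_a = rng.gauss(0.0, 1.0)
    r_b = rng.gauss(0.0, 1.0)
    return 0.01 * r_a * sigma / abs(r_b) ** (1 / beta)


def relocate(n, fs_l, fs_u, rng):
    """Equation (12): new position with one Lévy step per dimension."""
    return [fs_l + levy_step(rng) * (fs_u - fs_l) for _ in range(n)]


rng = random.Random(1)
s_c = seasonal_constant([0.0, 0.0], [3.0, 4.0])
s_min = seasonal_minimum(iteration=10, iter_max=100)
if s_c < s_min:
    print(relocate(2, -5.0, 5.0, rng))
else:
    print("winter not over:", s_c, ">=", s_min)
```

Because a Lévy step can be negative or large, a practical implementation would probably also clip the relocated position to the search bounds.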
2.3. BiLSTM 

Recurrent neural networks (RNNs) are networks with feedback connections, in which the output of 
a layer is fed back as input [35]. RNNs preserve state by using the results computed at a previous time step 
in the current time step. They are frequently employed as models in which values at earlier time steps may 
influence the current computation [36]. 

As shown in Figure 2, the BiLSTM memory block is trained in two ways: one pass uses information 
from the past and present, while the other uses information from the future. Each layer computes the 
following functions for each element of the input sequence. The variables of the LSTM cell are given by 
(13) to (18): 


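Input gate: $I_t = \sigma(W_i x_t + A_i h_{t-1} + b_i)$ (13) 
Forget gate: $F_t = \sigma(W_f x_t + A_f h_{t-1} + b_f)$ (14) 
Output gate: $O_t = \sigma(W_o x_t + A_o h_{t-1} + b_o)$ (15) 
New memory cell: $\tilde{c}_t = \tanh(W_c x_t + A_c h_{t-1} + b_c)$ (16) 
Final memory cell: $c_t = F_t \odot c_{t-1} + I_t \odot \tilde{c}_t$ (17) 
Final output: $h_t = O_t \odot \tanh(c_t)$ (18) 


where $W_i$, $W_f$, $W_o$, and $W_c$ are the input weight matrices, $A_i$, $A_f$, $A_o$, and $A_c$ are the 
weight matrices applied to the previous output $h_{t-1}$, $b$ denotes the bias vectors, $\sigma$ is the 
sigmoid function, and $\odot$ denotes element-wise multiplication. 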
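
For concreteness, a single LSTM time step can be written in plain Python as below. This is a sketch of the standard LSTM formulation rather than the implementation used in the experiments; the parameter layout (a dictionary mapping each gate to its input weights, recurrent weights, and bias) and the all-zero weights in the usage lines are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, x, A, h, b):
    """Compute W x + A h + b for list-of-lists matrices and list vectors."""
    return [sum(w * xi for w, xi in zip(W_row, x))
            + sum(a * hi for a, hi in zip(A_row, h)) + bi
            for W_row, A_row, bi in zip(W, A, b)]


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step; params maps 'i', 'f', 'o', 'c' to (W, A, b)."""
    def gate(name, activation):
        W, A, b = params[name]
        return [activation(v) for v in affine(W, x, A, h_prev, b)]

    i_t = gate("i", sigmoid)       # input gate
    f_t = gate("f", sigmoid)       # forget gate
    o_t = gate("o", sigmoid)       # output gate
    c_new = gate("c", math.tanh)   # new (candidate) memory cell
    # Final memory cell: keep part of the old state, add part of the candidate.
    c_t = [f * cp + i * cn for f, cp, i, cn in zip(f_t, c_prev, i_t, c_new)]
    # Final output: gated, squashed memory cell.
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]


hidden, inputs = 2, 3
params = {g: (zeros(hidden, inputs), zeros(hidden, hidden), [0.0] * hidden)
          for g in "ifoc"}
h_t, c_t = lstm_step([1.0, 2.0, 3.0], [0.0, 0.0], [1.0, -1.0], params)
print(h_t, c_t)
```

With all-zero weights every gate evaluates to 0.5 and the candidate to 0, so the memory cell is simply halved, which makes the role of the forget gate easy to see.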

Because it uses memory in only one direction, the unidirectional LSTM is prone to certain errors 
[37]-[39]. The BiLSTM, an extension of the conventional LSTM, can improve performance in sequence 
classification by also processing the sequence in reverse (future to past). The BiLSTM connects two hidden 
LSTM layers to the same output. This helps the model learn long-term dependencies and consequently 
improves its performance. According to previous studies, bidirectional networks outperform unidirectional 
ones in several areas, including the classification of anomalous video events. Figure 3 depicts the structure 
of the BiLSTM. 


Figure 2. LSTM model 

Figure 3. The unfolded architecture of BiLSTM 


The backward LSTM layer output is computed using the reversed inputs from time t-1 to t-n, while 
the forward output sequence is obtained in the same way as in the unidirectional case. These output 
sequences are then combined by a merging function into an output vector yt [40]. The final result can be 
represented by the vector Yt=[yt-n, ... , yt-1]. The complete procedure of the proposed system is depicted 
in Figure 4. 
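
The bidirectional pass described above can be sketched as follows. To keep the example short, a toy recurrent step stands in for the LSTM cell, and the two directions are merged by concatenation; both are assumptions for illustration, since the merging function is not specified here.

```python
import math


def run_recurrent(sequence, step, h0):
    """Apply a recurrent step over a sequence and return every hidden state."""
    h, outputs = h0, []
    for x in sequence:
        h = step(x, h)
        outputs.append(h)
    return outputs


def bidirectional(sequence, step_fwd, step_bwd, h0):
    """Run forward and backward passes and concatenate them per time step."""
    forward = run_recurrent(sequence, step_fwd, h0)
    # The backward pass reads the reversed sequence; reverse again to realign in time.
    backward = run_recurrent(sequence[::-1], step_bwd, h0)[::-1]
    return [f + b for f, b in zip(forward, backward)]  # list concatenation


def toy_step(x, h):
    """Stand-in recurrent cell (not an LSTM): h_t = tanh(x_t + 0.5 * h_{t-1})."""
    return [math.tanh(xi + 0.5 * hi) for xi, hi in zip(x, h)]


frames = [[0.1], [0.2], [0.3]]
outputs = bidirectional(frames, toy_step, toy_step, h0=[0.0])
for t, y in enumerate(outputs):
    print(t, y)
```

At the first time step the forward half has seen only the first frame while the backward half has already seen the whole sequence, which is the extra context a unidirectional model lacks.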

The SSA-based BiLSTM technique consists of two steps: data preprocessing and feature extraction 
using the SSA according to Algorithm 1. Simply put, frame recognition is a crucial component of frame 
classification, since it allows an object within an image to be located and measured accurately [41]. Frame 
localization establishes the position and dimensions of an object [42]. The anomalous behavior in the video 
is then detected by BiLSTM classification of the extracted features. 
Figure 4. Workflow of the proposed system 


Algorithm 1. 

Set the parameters Itermax, NP, n, Pdp, sf, Gc, FSU and FSL 
Initialize locations by equation (1) 
Compute the fitness value by equation (2) 
While Iter < Itermax 
    Sort the locations and categorize the trees using equations (3)-(6) 
    for t = 1 : n1 
        if R1 ≥ Pdp 
            Execute case (i) of equation (7) 
        else 
            Execute case (ii) of equation (7) 
        end 
    end 
    for t = 1 : n2 
        if R2 ≥ Pdp 
            Execute case (i) of equation (8) 
        else 
            Execute case (ii) of equation (8) 
        end 
    end 
    for t = 1 : n3 
        if R3 ≥ Pdp 
            Execute case (i) of equation (9) 
        else 
            Execute case (ii) of equation (9) 
        end 
    end 
    Check the seasonal monitoring condition 
    Compute the fitness value of the new locations 
    Iter = Iter + 1 
end 
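
A compact, runnable reading of Algorithm 1 is sketched below in Python. Several details are assumptions made only so that the loop can run end to end: normal-tree squirrels are split evenly between Scenarios 2 and 3, the hickory-tree squirrel stays in place, relocation is triggered only when all three seasonal constants fall below the threshold, the Lévy step uses Mantegna's method, and positions are clipped to the bounds. The default values of Pdp and Gc and the sphere test function are also illustrative.

```python
import math
import random


def levy(rng, beta=1.5):
    """Lévy-distributed step (Mantegna's method; an assumed choice)."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma = (num / den) ** (1 / beta)
    return 0.01 * rng.gauss(0, 1) * sigma / abs(rng.gauss(0, 1)) ** (1 / beta)


def ssa_minimize(fitness, n, fs_l, fs_u, np_size=20, iter_max=100,
                 p_dp=0.1, g_c=1.9, sf=18.0, seed=0):
    rng = random.Random(seed)

    def clip(x):
        return min(max(x, fs_l), fs_u)

    def random_position():
        return [fs_l + rng.random() * (fs_u - fs_l) for _ in range(n)]

    def glide(current, target):
        if rng.random() >= p_dp:                  # case (i): glide toward target
            d_g = rng.uniform(9.0, 20.0) / sf
            return [clip(c + d_g * g_c * (t - c)) for c, t in zip(current, target)]
        return random_position()                  # case (ii): predator present

    population = [random_position() for _ in range(np_size)]    # equation (1)
    for iteration in range(iter_max):
        population.sort(key=fitness)              # rank by fitness, best first
        hickory, acorn, normal = population[0], population[1:4], population[4:]
        half = len(normal) // 2
        acorn = [glide(p, hickory) for p in acorn]                       # scenario 1
        normal = ([glide(p, rng.choice(acorn)) for p in normal[:half]]   # scenario 2
                  + [glide(p, hickory) for p in normal[half:]])          # scenario 3
        # Seasonal monitoring: relocate normal-tree squirrels when winter is over.
        s_min = 10e-6 / 365.0 ** (iteration / (iter_max / 2.5))
        s_c = [math.sqrt(sum((a - h) ** 2 for a, h in zip(p, hickory))) for p in acorn]
        if all(s < s_min for s in s_c):
            normal = [[clip(fs_l + levy(rng) * (fs_u - fs_l)) for _ in range(n)]
                      for _ in normal]
        population = [hickory] + acorn + normal
    best = min(population, key=fitness)
    return best, fitness(best)


def sphere(position):
    """Hypothetical test function; its minimum is 0 at the origin."""
    return sum(x * x for x in position)


best, value = ssa_minimize(sphere, n=2, fs_l=-5.0, fs_u=5.0, iter_max=50)
print(best, value)
```

Because the hickory-tree location is carried over unchanged in every iteration, the best fitness found can never get worse from one iteration to the next.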
3. RESULTS AND DISCUSSION 
3.1. Dataset and evaluation metric 

The experiment uses the Avenue dataset, which has 16 training and 21 testing video clips. 
We used TensorFlow, a framework for neural network training [43]. The output layer of the RNN model, 
which divides the dataset into two categories (threat and safe), consists of just two neurons. The videos 
contain several anomalous sequences and have not been altered. There are 940 unshuffled frame chunks, 
each extracted as a batch of 15 frames per second of video. They are reduced via preprocessing and feature 
extraction. The proposed method is evaluated and compared with existing approaches in the following 
subsections. 
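
As described in the introduction, the features of fifteen consecutive frames are summed before classification. A minimal sketch of this chunking step is shown below; the non-overlapping windows, the dropping of a trailing incomplete chunk, and the two-dimensional toy features are assumptions made for the example.

```python
def sum_frame_features(frame_features, chunk_size=15):
    """Sum per-frame feature vectors over consecutive, non-overlapping chunks."""
    chunks = []
    # Assumption: a trailing chunk with fewer than chunk_size frames is dropped.
    for start in range(0, len(frame_features) - chunk_size + 1, chunk_size):
        window = frame_features[start:start + chunk_size]
        chunks.append([sum(values) for values in zip(*window)])
    return chunks


# 45 hypothetical frames, each with a 2-dimensional feature vector.
features = [[1.0, float(i)] for i in range(45)]
chunks = sum_frame_features(features)
print(len(chunks), chunks[0])
```

Each resulting chunk has the same dimensionality as a single frame feature, so the sequence fed to the classifier is fifteen times shorter than the raw frame sequence.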


3.2. Receiver operating characteristic curve 

Eighty percent of the data samples are used to train the BiLSTM classifier, and the remaining 
twenty percent are used for validation. The receiver operating characteristic (ROC) curve summarizes the 
classifier's overall performance. As seen in Figure 5, the BiLSTM classifier outperforms the LSTM (labeled 
"LSTM" in the legend) in terms of AUC. The ROC curve and AUC are computed as follows. The negative 
log of the sequence probability is computed for each sequence in the validation data and compared against 
different threshold values: 
— The sequence is classified as an attack (positive) if the negative log value is higher than the threshold; 
otherwise it is classified as normal (negative). 
— The sequence is labeled as TP, FP, TN, or FN. 
— The ROC curve is plotted for the various threshold levels. 
— The AUC value of the ROC curve is computed. 
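
The steps above can be sketched in a few lines of Python. The scores and labels are made-up values, the thresholds are taken to be the distinct scores themselves, and the area is computed with the trapezoidal rule; all three are assumptions, and the sketch requires at least one positive and one negative sample.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points obtained by sweeping a threshold over the scores.

    A sequence is predicted positive when its score is higher than the threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    thresholds = sorted(set(scores), reverse=True) + [float("-inf")]
    points = []
    for threshold in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s > threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s > threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical negative-log scores and ground-truth labels (1 = attack, 0 = normal).
scores = [0.9, 0.7, 0.6, 0.2]
labels = [1, 0, 1, 0]
print(auc(roc_points(scores, labels)))
```

The value returned equals the fraction of positive-negative pairs in which the positive sequence receives the higher score, which is why an AUC near 1 indicates good separation.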


[Figure 5: ROC plot of true positive rate against false positive rate. Legend: LSTM (AUC = 0.865), BiLSTM (AUC = 0.931).] 


Figure 5. ROC curve for our classifier model 


The ROC curve demonstrates that the BiLSTM achieved the best AUC, 93.1%, compared with 
86.5% for the LSTM. An AUC value close to 1 indicates that the model has a strong ability to distinguish 
between normal and anomalous data. 


3.3. Quantitative results analysis 

Accuracy, precision, recall, and the F1 measure are the metrics used for evaluation. They are 
computed from the confusion matrix as accuracy=(TP+TN)/(TP+FN+FP+TN), precision=TP/(TP+FP), 
recall=TP/(TP+FN), and F1=2*precision*recall/(precision+recall), where TP, TN, FP, and FN denote the 
true positives, true negatives, false positives, and false negatives in the classification results. Table 1 and 
Figure 6 compare the proposed method with other algorithms currently in use, namely CNN, F-CNN, and 
LSTM, and show that it achieves the best accuracy rate with reduced complexity. 
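
The four metrics follow directly from the confusion-matrix counts, as the short Python sketch below shows. The counts in the usage line are hypothetical and are not results from this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=40, tn=50, fp=5, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Since F1 is the harmonic mean of precision and recall, it always lies between the two.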


Table 1. Comparison of key metrics (%) 
Methods    Accuracy   Precision   Recall   F1 
CNN        89.7       84.9        81.6     82.5 
F-CNN      92.6       87.2        84.6     87.4 
LSTM       96.8       97.3        85       96.5 
Bi-LSTM    98.2       97.3        95       96.5 
Figure 6. Comparative analysis graph 


4. CONCLUSION 

This research proposes a novel SSA-BiLSTM approach for anomaly identification in video. The 
SSA is a recent global search optimization method based on how flying squirrels find food. The gliding 
behavior of the flying squirrel population is described as random and fuzzy using the normal cloud model. 
The selection strategy improves a flying squirrel's capacity for local search. Additionally, enhancing the 
dimensional search produces better iterations of the optimal solution. The BiLSTM is used to detect 
anomalies, and it is compared extensively with other methods. ROC curve and AUC calculations 
demonstrate the BiLSTM's superior performance in anomaly identification. Because the BiLSTM network's 
design allows input to flow in both directions and thus retains both past and future information, the 
BiLSTM performs better than the unidirectional LSTM. The combination of SSA and BiLSTM results in 
better anomaly detection accuracy. The proposed framework can be integrated with future techniques for 
scaling, feature selection, and dimensionality reduction. 